Skip to content

fix: preserve timestamp precision when coercing mixed time units#22759

Merged
Dandandan merged 1 commit into
apache:mainfrom
fengys1996:fix/timestamp-comparison-finer-unit
Jun 6, 2026
Merged

fix: preserve timestamp precision when coercing mixed time units#22759
Dandandan merged 1 commit into
apache:mainfrom
fengys1996:fix/timestamp-comparison-finer-unit

Conversation

@fengys1996
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

DataFusion coerced Timestamp values with different TimeUnits to the coarser unit which result in a loss of precision and lead to incorrect comparison semantics

What changes are included in this PR?

  • Change timestamp time unit coercion to prefer the finer unit.
  • Add overflow checks for timestamp-to-timestamp casts when converting to a finer unit.

Are these changes tested?

Yes. Added unit tests for timestamp coercion and overflow checks, plus sqllogictest coverage for comparison, subtraction, UNION, and COALESCE.

Are there any user-facing changes?

Yes. Timestamp operations with mixed time units now coerce to the finer unit. If that conversion would overflow i64, the query returns an overflow error.

@github-actions github-actions Bot added logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) common Related to common crate labels Jun 4, 2026
@fengys1996 fengys1996 changed the title fix: timestamp comparisons to coerce to finer unit fix: preserve timestamp precision when coercing mixed time units Jun 4, 2026
@fengys1996
Copy link
Copy Markdown
Contributor Author

@alamb ptal if you have time! Thank you.

@Jefffrey
Copy link
Copy Markdown
Contributor

Jefffrey commented Jun 5, 2026

@alamb
Copy link
Copy Markdown
Contributor

alamb commented Jun 5, 2026

is this similar to

I think they are related, but not quite the same. Specifically #21908 I think is for the unwrap in cast optimization (e.g. '2025-12-25' = CAST(column as date) and moving the cast to the constant

I think this is about coercing types when comparing across unions and comparisons

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me -- thank you @fengys1996 and @Jefffrey

}

fn date_scalar_value_as_i64(&self) -> Option<i64> {
fn temporal_scalar_value_as_i64(&self) -> Option<i64> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make sense to keep treating date and timestamp separately for consistency with the rest of the code. For example keep date_scalar_value_as_i64 and add timestamp_scalar_value_as_i64

But this looks fine too

fn timeunit_coercion(lhs_unit: &TimeUnit, rhs_unit: &TimeUnit) -> TimeUnit {
use arrow::datatypes::TimeUnit::*;
match (lhs_unit, rhs_unit) {
(Second, Millisecond) => Second,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is an example where the old code would lose data (it would convert millisecond precision to second precision)

@Dandandan Dandandan added this pull request to the merge queue Jun 6, 2026
Merged via the queue into apache:main with commit 51673d0 Jun 6, 2026
39 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incorrect timestamp comparison with mixed time units

4 participants